Neural Based Statement Classification for Biased Language
Biased language commonly occurs around topics that are controversial in
nature, stirring disagreement between the parties involved in a
discussion. This is because stances on language and its use,
specifically the understanding and use of phrases, are cohesive
within particular groups. However, such cohesiveness does not hold across
groups.
In collaborative environments or environments where impartial language is
desired (e.g. Wikipedia, news media), statements and the language therein
should represent equally the involved parties and be neutrally phrased. Biased
language is introduced through the presence of inflammatory words or phrases,
or statements that may be incorrect or one-sided, thus violating such
consensus.
In this work, we focus on the specific case of phrasing bias, which may be
introduced through specific inflammatory words or phrases in a statement. For
this purpose, we propose an approach that relies on recurrent neural networks
in order to capture the inter-dependencies between words in a phrase that
introduce bias.
We perform a thorough experimental evaluation, where we show the advantages
of a neural based approach over competitors that rely on word lexicons and
other hand-crafted features in detecting biased language. We are able to
distinguish biased statements with a precision of P=0.92, thus significantly
outperforming baseline models with an improvement of over 30%. Finally, we
release the largest corpus of statements annotated for biased language.
Comment: The Twelfth ACM International Conference on Web Search and Data
Mining, February 11-15, 2019, Melbourne, VIC, Australia
CAUSES AND CONSEQUENCES OF DOMESTIC VIOLENCE, SOCIO-CULTURAL DIFFERENCES IN KOSOVO
In this paper we seek to identify the causes and consequences of domestic violence in Kosovo. As a country whose society is undergoing a radical transition, Kosovo faces various challenges in building a state where social and political rights are equal for everyone, regardless of gender, age, religion, race, political orientation, language, etc. So far, a number of legal measures addressing domestic violence have been taken. Under the pressure of European integration, Kosovo has approved the national program against domestic violence, the Law on the Family, the Law on Protection from Domestic Violence, and various strategies with international support, and nongovernmental organizations play an active role in advocating gender equality. Domestic violence as a social phenomenon has been examined in depth by social scholars as an act that violates human rights and the principle that all human beings are free and equal in rights and dignity. In this paper we discuss the official data on domestic violence in Kosovo, covering cases ranging from deaths and suicide to child abuse, disturbance, disagreement and other variables. We also explain how domestic violence is defined in Kosovo, from physical abuse to economic abuse. The main part of this article analyzes official data from studies, from security agencies such as the police and the justice system, and from nongovernmental organizations working on this issue. The aim of the state institutions is to prevent domestic violence, but what is the real situation in the field? Do they protect and secure the victims? Do they offer training and reintegration for the victims? However, these data bring us into line with the real situation of domestic violence in Kosovo, regardless of the different perceptions of this phenomenon in our society.
Improving Entity Retrieval on Structured Data
The increasing amount of data on the Web, in particular of Linked Data, has
led to a diverse landscape of datasets, which makes entity retrieval a
challenging task. Explicit cross-dataset links, for instance to indicate
co-references or related entities, can significantly improve entity retrieval.
However, only a small fraction of entities are interlinked through explicit
statements. In this paper, we propose a two-fold entity retrieval approach. In
a first, offline preprocessing step, we cluster entities based on the
x-means and spectral clustering algorithms. In the second step,
we propose an optimized retrieval model which takes advantage of our
precomputed clusters. For a given set of entities retrieved by the BM25F
retrieval approach and a given user query, we further expand the result set
with relevant entities by considering features of the queries, entities and the
precomputed clusters. Finally, we re-rank the expanded result set with respect
to the relevance to the query. We perform a thorough experimental evaluation on
the Billion Triple Challenge (BTC12) dataset. The proposed approach shows
significant improvements compared to the baseline and state-of-the-art
approaches.
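The retrieve, expand and re-rank steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-overlap score is a toy stand-in for BM25F, the cluster labels are assumed to come from an offline clustering step, and the names `overlap_score`, `expand_and_rerank`, `entities` and `cluster_of` are hypothetical.

```python
from collections import Counter


def overlap_score(query, text):
    """Toy relevance score: occurrences of query terms in the text.

    This is only a stand-in for BM25F, used to keep the sketch self-contained.
    """
    query_terms = set(query.lower().split())
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def expand_and_rerank(query, initial_ids, entities, cluster_of, top_k=5):
    """Expand an initial result set with cluster neighbours, then re-rank.

    entities:   entity id -> textual description (hypothetical toy data)
    cluster_of: entity id -> precomputed cluster label (assumed given)
    """
    # Clusters touched by the initially retrieved entities.
    hit_clusters = {cluster_of[eid] for eid in initial_ids}

    # Add entities from those clusters that match at least one query term.
    candidates = set(initial_ids)
    for eid, cluster_id in cluster_of.items():
        if cluster_id in hit_clusters and overlap_score(query, entities[eid]) > 0:
            candidates.add(eid)

    # Re-rank the expanded set by relevance to the query (ties broken by id).
    ranked = sorted(candidates, key=lambda e: (-overlap_score(query, entities[e]), e))
    return ranked[:top_k]


entities = {
    "e1": "Berlin capital city of Germany",
    "e2": "Berlin city in Germany on the Spree",
    "e3": "Paris capital city of France",
    "e4": "Berlin band from Los Angeles",
}
cluster_of = {"e1": 0, "e2": 0, "e3": 0, "e4": 1}

result = expand_and_rerank("berlin city", ["e1"], entities, cluster_of)
print(result)
```

In this toy run the initial result `e1` pulls in `e2` and `e3` from the same cluster, while `e4` sits in a different cluster and is never considered.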
InstructPTS: Instruction-Tuning LLMs for Product Title Summarization
E-commerce product catalogs contain billions of items. Most products have
lengthy titles, as sellers pack them with product attributes to improve
retrieval, and highlight key product aspects. This results in a gap between
such unnatural product titles and how customers refer to them. It also limits
how e-commerce stores can use these seller-provided titles for recommendation,
QA, or review summarization.
Inspired by recent work on instruction-tuned LLMs, we present InstructPTS, a
controllable approach for the task of Product Title Summarization (PTS).
Trained using a novel instruction fine-tuning strategy, our approach is able to
summarize product titles according to various criteria (e.g. number of words in
a summary, inclusion of specific phrases, etc.). Extensive evaluation on a
real-world e-commerce catalog shows that compared to simple fine-tuning of
LLMs, our proposed approach can generate more accurate product name summaries,
with an improvement of over 14 and 8 BLEU and ROUGE points, respectively.
Comment: Accepted by EMNLP 2023 (Industry Track)
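As a rough illustration of what controllable summarization criteria could look like in practice, the sketch below builds an instruction prompt carrying a word limit and a required phrase, and checks whether a candidate summary respects them. The prompt wording, the function names `build_instruction` and `satisfies`, and the example title are all assumptions made for illustration; the abstract does not specify the actual instruction format.

```python
def build_instruction(title, max_words=None, must_include=None):
    """Compose a hypothetical instruction prompt for title summarization."""
    constraints = []
    if max_words is not None:
        constraints.append(f"use at most {max_words} words")
    if must_include:
        constraints.append(f'include the phrase "{must_include}"')

    instruction = "Summarize the product title"
    if constraints:
        instruction += "; " + " and ".join(constraints)
    return f"{instruction}.\nTitle: {title}\nSummary:"


def satisfies(summary, max_words=None, must_include=None):
    """Check whether a summary meets the word limit and phrase constraint."""
    if max_words is not None and len(summary.split()) > max_words:
        return False
    if must_include and must_include.lower() not in summary.lower():
        return False
    return True


# Made-up product title, used only to exercise the helpers.
title = "Acme Wireless Mouse 2.4GHz USB Receiver Ergonomic Black"
prompt = build_instruction(title, max_words=3, must_include="mouse")
print(prompt)
print(satisfies("Acme wireless mouse", max_words=3, must_include="mouse"))
```

A checker like `satisfies` is one simple way to verify that generated summaries actually honor the requested criteria, independently of BLEU or ROUGE.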
Answering Unanswered Questions through Semantic Reformulations in Spoken QA
Spoken Question Answering (QA) is a key feature of voice assistants, usually
backed by multiple QA systems. Users ask questions via spontaneous speech which
can contain disfluencies, errors, and informal syntax or phrasing. This is a
major challenge in QA, causing unanswered questions or irrelevant answers, and
leading to bad user experiences. We analyze failed QA requests to identify core
challenges: lexical gaps, proposition types, complex syntactic structure, and
high specificity. We propose a Semantic Question Reformulation (SURF) model
offering three linguistically-grounded operations (repair, syntactic reshaping,
generalization) to rewrite questions to facilitate answering. Offline
evaluation on 1M unanswered questions from a leading voice assistant shows that
SURF significantly improves answer rates: up to 24% of previously unanswered
questions obtain relevant answers (75%). Live deployment shows positive impact
for millions of customers with unanswered questions; explicit relevance
feedback shows high user satisfaction.
Comment: Accepted by ACL 2023 Industry Track